The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction
Abstract
This paper introduces the freely available WikEd Error Corpus. We describe the data mining process from Wikipedia revision histories, corpus content and format. The corpus consists of more than 12 million sentences with a total of 14 million edits of various types. As one possible application, we show that WikEd can be successfully adapted to improve a strong baseline in a task of grammatical error correction for English-as-a-Second-Language (ESL) learners’ writings by 2.63%. Used together with an ESL error corpus, a composed system gains 1.64% when compared to the ESL-trained system.
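The abstract does not spell out the mining procedure, so the following is only a minimal sketch of one way corrective edits could be pulled from two consecutive revisions of an article. It assumes the revisions have already been split into sentences and tokenized; it is not the authors' pipeline. The function names `extract_sentence_edits` and `word_edits` and the `min_ratio` threshold are hypothetical, chosen for illustration, and the diffing relies on Python's standard `difflib` module.

```python
import difflib


def extract_sentence_edits(old_sentences, new_sentences, min_ratio=0.7):
    """Pair sentences that were modified between two revisions.

    Assumption: a one-to-one sentence replacement whose word-level
    similarity is at least `min_ratio` is a candidate corrective edit.
    The threshold is illustrative, not taken from the paper.
    """
    matcher = difflib.SequenceMatcher(
        a=old_sentences, b=new_sentences, autojunk=False
    )
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Skip unchanged, inserted, and deleted sentences, as well as
        # replacements that do not align one-to-one.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for old, new in zip(old_sentences[i1:i2], new_sentences[j1:j2]):
            ratio = difflib.SequenceMatcher(
                a=old.split(), b=new.split(), autojunk=False
            ).ratio()
            if ratio >= min_ratio:
                pairs.append((old, new))
    return pairs


def word_edits(old, new):
    """Return (before, after) word spans that differ between two sentences."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_rev = ["This is a sentence .", "He go to school every day .", "Unchanged ."]
new_rev = ["This is a sentence .", "He goes to school every day .", "Unchanged ."]

for old, new in extract_sentence_edits(old_rev, new_rev):
    print(word_edits(old, new))
```

A similarity threshold of this kind is one simple way to separate small corrections from wholesale rewrites or vandalism; a real pipeline would likely need further filtering.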
Similar resources
JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction
We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and bench...
The Effect of Focused Corrective Feedback and Attitude on Grammatical Accuracy: A Study of Iranian EFL Learners
The study aimed at investigating the efficacy of written corrective feedback (CF) in improving Iranian EFL learners’ grammatical accuracy. It compared the effects of focused and unfocused written CF on the learners’ grammatical accuracy. 75 EFL students formed one control and two experimental groups. The focused feedback group was provided with error correction in tenses. The unfocus...
Automatically Classifying Edit Categories in Wikipedia Revisions
In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machin...
An Explicit Feedback System for Preposition Errors based on Wikipedia Revisions
This paper presents a proof-of-concept tool for providing automated explicit feedback to language learners based on data mined from Wikipedia revisions. The tool takes a sentence with a grammatical error as input and displays a ranked list of corrections for that error along with evidence to support each correction choice. We use lexical and part-of-speech contexts, as well as query expansion w...